LOW COMPLEXITY DAMPED GAUSS-NEWTON ALGORITHMS FOR 
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Abstract. The damped Gauss-Newton (dGN) algorithm for CANDECOMP/PARAFAC (CP) decomposition can 
handle the challenges of collinearity of factors and different magnitudes of factors; nevertheless, for factorization of 
an N-D tensor of size I\ X . . . X If/ with rank R, the algorithm is computationally demanding due to construction of 
large approximate Hessian of size (RT X RT) and its inversion where T = /„. In this paper, we propose a fast 
implementation of the dGN algorithm which is based on novel expressions of the inverse approximate Hessian in 
block form. The new implementation has lower computational complexity, besides computation of the gradient (this 
part is common to both methods), requiring the inversion of a matrix of size NR 2 X NR 2 , which is much smaller 
than the whole approximate Hessian, if T 2> NR. In addition, the implementation has lower memory requirements, 
because neither the Hessian nor its inverse never need to be stored in their entirety. A variant of the algorithm 
working with complex valued data is proposed as well. Complexity and performance of the proposed algorithm is 
compared with those of dGN and ALS with line search on examples of difficult benchmark tensors. 
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1. Introduction. Algorithms for canonical polyadic decomposition, also coined CAN- 
DECOMP/PARAFAC (CP), can work well for general data EHIQI). However, they often 
fail for data with factors of different magnitudes 1201 or collinear factors such as bottle- 
necks and swamps. Bottlenecks arise when two or more components are collinear [6,9], and 
swamps arise when collinearity exists in all modes ||6][T7]. Alternating least squares (ALS) 
algorithms with line searches, regularization, and rotation can improve performance, but they 
do not completely solve the problems. The damped Gauss-Newton (dGN) or Levenberg- 
Marquardt (LM) algorithm has been confirmed to successfully decompose such difficult 
data 11 1II19J42T1I29II31I1 . However, because these methods require the inverse of a large-scale 
approximate Hessian matrix, the dGN algorithm is not applicable to real-world large-scale 
and high-dimensional data. In this paper, we establish a fast inverse of the approximate Hes- 
sian for low-rank tensor factorization by proving that the approximate Hessian for low-rank 
tensor factorization is a low-rank adjustment to a block diagonal matrix, and propose fast 
dGN algorithms that do not need to store the approximate Hessian and its inverse entirely at 
one time. 

The paper is organized as follows. Notation and basic multilinear algebra are briefly 
reviewed in Section [2] CP model and common algorithms are shortly reviewed in Section [3] 
Section [4] derives the fast dGN algorithm. Low -rank adjustment of approximate Hessian is 
derived, and its fast inverse is deduced in this section. The fast dGN algorithm with two 
variants has been proposed in Section 14.21 The fast dGN is extended to complex-valued 
tensor factorization in Section In Section [6] we provide examples illustrating the validity 
and performance of the proposed algorithms. Finally, Section|7]concludes the paper. 

2. Tensor notation and CANDECOMP/PARAFAC (CP) model. We shall denote a 
tensor by bold calligraphic letters, e.g., A. e R'l^x-xfe^ matrices by bold capital letters, e.g., 
A =[fli,fl2> • • • iUr] e R /xS , and vectors by bold italic letters, e.g., aj or / = ...,/#]• 
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Mode-n tensor unfolding of y is denoted by Y ( „). Generally, we adopt notation used in 
ll5l fT4ll . The Kronecker, Khatri-Rao (column-wise Kronecker) and Hadamard products and 
are denoted respectively by 0, 0, ©, ll5l fT4ll . 

Notation 2.1. Given N matrices A'"' e R ,nXR , we consider the following products 

N 

® A (n) = A (N) © • • ■ © A (,,) © ■ • • © A (1) , /„ = /, Vn, 
n=i 

®A (k) = A (N) © ■ ■ • © A (n+1) © A (B_1) © ■ • • © A (1) , I„ = /, Vn, 

OA (t) = A (N) o ■ ■ ■ o A ( " +1) o A ( "- 15 • ■ ■ A (1) . 

Definition 2.1. (Partitioned matrix and block matrix) A partitioned matrix U of N 
matrices U™ along the mode-2 (horizontal) is denoted by 

U = [U« ••• U w ••• U^] = [u w f_ 1 , (2.1) 
and a partitioned matrix V of NM matrices V ( "' m ' along two modes is denoted by V = 

r , slN,M . 

|V U! ' m 'J i .A block diagonal matrix B of N matrices U w is denoted by 

U(1) 1 N 

= blkdiag(U (1) ,--- ,U (A °) = blkdiagfu*"') (2.2) 

■ ^ {N) \ I \ ln=\ 



B 



Definition 2.2. (CANDECOMP/PARAFAC (CP)) A CPD consists in representing a 
given N-th order data tensor y e R / i >t fe x "" x7 » by a set of N matrices (factors): A (jl) = 
[af\ af, . . . , af] e R ! - xR , (n=l,2,...,N) IMMU2I such that 

y^fl* 1) o fl f»o...o fl f)4, (2.3) 

where symbol "o " denotes outer product. Tensor \) is an approximation of the data tensor \}. 
We often assume unit-length components \\a^ |b = 1 for n = 1,2, ...,N— 1, r = 1,2,...,/?. 

3. CP Algorithms. The Alternating Least Squares (ALS) algorithm ll2H4] [T0ll33l se- 
quentially updates A (,l) using the update rule given by 

A w = Y ( „)JpA w J(r (n) ) T , (n= 1,2 iV). (3.1) 

where T (n) = ® kt „ C w , C (n) = A (n) 1 A (n) (n = 1,2,..., AO is defined as in NotationO "t" 
denotes the pseudo-inverse. 

Denote by a e R RT , T = /„, concatenation of vectorizations of A™, n = 1, 2, . . . , N, 



a = 



vec(A (1) ) r ■■• vec(A ( ">) r ••■ vec(A w ) 7 



(3.2) 



All-at-once algorithms such as the OPT algorithm [1|, the PMF3, damped Gauss-Newton 
(dGN) algorithms [ 1 1 ,20 29 3 1 1 simultaneously update <X. The dGN algorithm is given by 

a + (H + pl RT y l g, (3.3) 
H = J r J, s = J r vec(£). (3.4) 

where £ = ^ - y, J € R JxRT , (J = Y[„ h) is the Jacobian of vec('y) with respect to a, H 
denotes the approximate Hessian, and the damping parameter p > 0. Paatero [20| empha- 
sized advantage of dGN compared with ALS when dealing with problems regarding swamps, 
different magnitudes of factors. 



Low Complexity Damped Gauss-Newton Algorithms for CANDECOMP/PARAFAC 



3 



The Gauss-Newton (GN) algorithm can be derived from Newton's method. Hence, the 
rate of convergence of the update rule (13.3) is at most quadratic. However, these methods 
face problems involving the large-scale Jacobian and large-scale inverse of the approximate 
Hessian H = J T J e R RTxRT . In order to eliminate the Jacobian, Paatero lEOl established 
explicit expressions for submatrices of H. We note that inverse of H is the largest workload 
of the GN algorithm with a complexity of order 0(R 3 T 3 ) besides the computation of the 
gradient g. Paatero [20] solved the inverse problem H 1 by Cholesky decomposition of the 
approximate Hessian and back substitution. However, the algorithm is still computationally 
demanding. Tomasi [29] extended Paatero's results [20 1, and derived a convenient method 
to construct H and the gradient for N-way tensor without using the Jacobian. In order to 
cope with the inverse of H, Tomasi 13011 used QR decomposition. However, the efficiency of 
existing dGN algorithms are still not sufficient for the large-scale problems due to the inverse 
H 1 

Recently, Tichavsky and Koldovsky |24l have proposed a novel method to invert the 
approximate Hessian based on 3R 2 x 3R 2 dimensional matrices. For low-rank approximation 
R <s /„, Vn, this method dramatically improves the running time. However, the algorithm still 
demands significant temporary extra-storage, and it is restricted for third-order tensors. 

4. Fast damped Gauss-Newton algorithm. In this section, we will derive a fast dGN 
algorithm for low-rank approximation of tensors with arbitrary dimensions. The most impor- 
tant challenge of the update rule ( 13.3) is to reduce the computational cost for construction of 
the approximate Hessian H and its inverse. 

Theorem 4.1 (Fast dGN algorithm). Define matrices T { "' m) of size (RxR), n = l,2,...,N, 
m — 1,2, ... ,N, and a partitioned matrix K of size ( NR 2 X NR 2 ) comprising matrices K*"' m) 



j-i(«,m) 



|^p(n,m)j 7 ' _ ^(m,n)| r _ (g) Q(k) ^ Q(n) _ ^(n)T a (b) g 



k±n,m 



(l-<J„, m )P s diag(vec(r ( "' m) )) 



e R R - xR \ n 



,N,m = 1, 



(4.1) 

,N, (4.2) 



where 5„, m is the Kronecker delta, P/ y is a permutation matrix for any I X J matrix X such 
that P/, y vec(x r ) = vec(X), P R = P S>R and T <n) = T M . 

For NR <K T, the fast dGN algorithm is written for each factor A (n> as follows 



aw «- aw + aw (\ R - ( F „ + rw) r|f) 



1,2, 



,N. 



(4.3) 



where A? is a variant of the ALS update rule ( li.il ) with a damping parameter fi > 0, F„ of 



A(«) 



size (RxR) are frontal slices of "ZF whose vec(3 r ) = w^, and 

(I^+A)" 1 . 



~(«) 

r 



B 



for invertible K, 



, otherwise, 
*¥ M = blkdiag (f^ ® C (n) ) e S 

V ln=\ 



aw^ a w_ rr 



B„ € 



r = ©c (n) . 

n=l 



(4.4) 
(4.5) 
(4.6) 

(4.7) 
(4.8) 
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Fig. 4.1. Illustration of the approximate Hessian for a 5-D tensor which can be expressed as a low rank 
adjustment H = G + ZKZ r ajin Theorem \4.2\ Green dots indicate nonzero elements. 



In order to prove Theorem 14. II we derive a low rank adjustment for H and employ the 
binomial inverse theorem 11131 to invert a smaller matrix of size NR 2 x NR 2 instead of H 

4.1. Fast inverse of the approximate Hessian H. 

Theorem 4.2 (Low rank adjustment for the approximate Hessian H). With K defined in 
Theorem \4.1\ the approximate Hessian H can be decomposed into 

H = G + ZKZ r , (4.9) 

G^lkdiaglT* ®^^ e R RTxRT , (4.10) 

V n / n- 1 

Z = blkdiagfl*® A w f e r RTxNr2 . (4.11) 

V rn=\ 

Proof of Theorem l4.2l is given in Appendix IB] whereas an example of H for a 5-D tensor of 
size 3x4x5x6x7 composed by 5 factors each of which has 3 components is illustrated in 
Fig. 14. II In the left hand side of Fig. |4~T1 H consists of (N(N - 1))R 2 rank-one matrices and 
NR 2 diagonal matrices which are located along its main diagonal. 

Theorem 4.3 (Fast inverse of the damped approximate Hessian). Inverse of the damped 
approximate Hessian H^ = H + p Irt can be computed through 

H^ 1 =0^-1^1^, (4.12) 

where is an NR 2 X NR 2 matrix defined in j4.6\) and 

= blkdiag (fl"' ® I,„) e R RTxRT , (4.13) 
L, = blkdiag (if ® A (n) f e R RTxNr2 . (4. 14) 



The matrix K can also be expressed as a partitioned matrix of matrices D ( "'"'' = (1 - 
(5„ ;m )diag(vec(r ( "- m) )) e R R2xS2 

K = (L N ®P R ) [D ( "' m) l . (4.15) 

If all the entries y ™'™ of r ( "' m) are non-zeros, the matrix D is invertible, and its inverse is 
also a partitioned matrix comprising diagonal matrices. Inverse of K is briefly described in 
Appendix|E] 

An alternative expression H^ 1 can be written in block form. 

Theorem 4.4 (Fast inversion of H^ in the block form). Inverse o/H^ can be written as 
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■fir • 


jj(l,m) 






fir • 


Tx(«,m) 








5(W 


SWA') 

n /j J 



(4.16) 



where 



8 n , m (?f I/„ ) - (Ir A«) S<"' m) (l« A< m > r ) , (4. 17) 

andlf m) = (rj? I R )B<"' m) (f^ Ij?) are matrices of size R 2 X fl 2 . 

Proof. From ( flTHl denote by B^ m) the (m, n)-th block of B^, we have 

^ } = S n.m ® I/, ,) - (f J° I/„) (i* A<">) B<*»> (l«0A^ r )(ff I 4 ) 

= «5„, m (f f I, ) - (l R A^) (F W I S ) B** (C ® I*) (I* ® A« r ) . 

Please note that the inversion of H A , in the block form saves memory. It requires to save only 

the matrices r„ and S^,. While the full matrix H or its inverse has R 2 T 2 elements, the memory 

saving format only requires to store NR 2 elements of matrices T and N 2 R 4 elements of S^,. 




4.2. Proof of Theorem gjj We replace H^ 1 in (O by those in <l4"J2t in Theoreml43 
or Theorem l4.4l and formulate the fast dGN algorithm 



a + G^g - Lf, B^ g 



(4.18) 



The Jacobian, which may demand high computational cost, still exists in the gradient g in the 
update rule (14.181 1. We also note that is a block diagonal matrix of N Kronecker products 

given in ( 14.141 1. Construction of L A , has a computational complexity of 

order o(t /? 3 ), and requires an extra-storage of o(77? 3 ). In order to completely bypass the 
Jacobian J in (14.181 1 and avoid building up the matrix L„, we seek convenient methods for 



computing G M g, = g, and product L^, B^, 

Lemma 4.5 (Optimize the update rule (14.18b ). With A^ n) , T and the tensor 2F defined in 
Theorem\4.1\ 



(Gfj g) 1 



vecfAW-AWrWf^) 7 



»V = L J g = vec 



L M B^ w M 



A wr A« - rif 



(A^F.lf) 



N 
n=l 



(4.19) 
(4.20) 

(4.21) 
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Algorithm 1: Fast Algorithm for Low-Rank Approximation 
Input: ^: input data of size I\ X I2 X • ■ ■ X 
R: number of basis components 
Output: N factors A (n) e R 7 " xS . 

begin 

Random or SVD initialization for A^, Vn 
repeat 
*V = [] 

for n = 1 to N do 

for m = n + 1 to N do % K in Eq . (TOl) 

K («,m) _ K (,«,n) _ Ps diag (vec(r ( "' m) )) % T ( "- m) = © C* C M = A { " )T A M 

A««-Y w (pA«))f^ 

7 



% damped ALS factor 



/ A («)r A (n)_ TT ^J 



<") = r (n) a c (n) 
1 



«p™ = r;-®c 



% Eq. dTJ72T) 
% T p = blkdiag M°) in Eq. d477b 

/ = (Kr 1 + x pX W,, % or / = k(I + Y |1 k)~ 1 w, 1 in Eq. (EE) 

forn = 1 foN do % Update A (,,) using Eq. (TOl) 

A« «- A w + A« (l s - (F„ + rW) ff ) % vec(S) = / 

Normalize A (n) , n=l,2,...,N 
Update yU 
until a stopping criterion is met 



Proof of Lemma |431 is given in Appendix iDl By replacing G^g, L^g, and L^B^Hy, in 
(14 . 1 8b by those in ( 14.19b . (14.20b and ( 14.21b . we obtain a compact update rule for each factor 
A (n) , n = 1, 2, . . . , N as given in Theorem l4.ll 

We note that linear systems B^, in ( 14.6b have a computational complexity of order 
0(W 3 R 6 ) which is much lower than 0(R 3 T 3 ) for (H + p I)" 1 for NR <K T. Pseudo code of the 
proposed algorithm based on the update rule (14. 3b is given in AlgorithmQ] If components of 
A (n) are mutually non-orthogonal, K is invertible, and its inverse can be explicitly computed 
as in Appendix [E] In this case, Step [3] is replaced by (IE. lb . A practical normalization in 
Step[10]is that the energy of the components is equally distributed in all modes. The method 
often enhances the convergence speed of the LM iteration | 



4.3. Two variants of the fast dGN algorithm. From ( 14. 6b , we present two variants of 
the fast dGN algorithm which solve the corresponding inverse problem O^'ny. 
(a) fLM a . <D = 4>i = l NR i + T^K comprises N diagonal matrices I S 2, and N(N - 1) block 

matrices PfiD'"' m \ for n + m. Note that 4>i is not symmetric, and its 

density is given by 

N(N-1)R 4 +NR 2 (N-1)R 2 + 1 
^ = W¥ = • (4 ' 22) 
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Fig. 4.2. Illustration of structure ofNR 2 X NR 2 sparse matrices Oi and <5>2for «3x4x5x6x7 dimensional 
tensor composed by R = 3 rank-one tensors. The matrix <S>\ is less sparse than the matrix <&2- Blue dots denote 
nonzero entries. 



For 3-D tensor factorizations, the fast dGN algorithm in which Step [8] solves Oj'w^ 
simplifies into the LM- 1 algorithm in 11241 . 
(b) fLM b . O = 2 = K 1 + "P^ is a symmetric matrix of size AW 2 x NR 2 derived from 
( 14.21 ) and ( 14.71 i. Theorem IE. 1 1 presents an explicit form of K 1 which is a partitioned 
matrix of (R 2 x R 2 ) diagonal matrices. Hence, it has only A^ 2 R 2 non-zero entries. The 
block diagonal matrix ( 14.7| > is constructed from N (R 2 x R 2 ) sub-matrices. As a 
consequence, the density of the sparse matrix O2 e xNR is 

N 2 R 2 +NR 4 -NR 2 R 2 +N-l 

d® 2 = z — ": = = . (4.23) 

N 2 R 4 NR 2 

Because Oi is not symmetric and less sparse than O2, solving the linear system Oj w M 
could be more time consuming than solving 2 1 wy Inverse of K is not expensive and has the 
explicit expression given in Theorem lE.il However, when the factor matrices have mutually 
orthogonal columns, K is singular because it has collinear columns and rows. In Fig. 14. 21 we 
illustrate the structures and properties of the two matrices Oj and <D 2 for a3x4x5x6x7 
dimensional tensor composed by R = 3 rank-one tensors. 

4.4. Comparison of complexity between dGN and fast dGN. In general, the dGN al- 
gorithm [20 29 1 constructs the whole approximate Hessian of size RTxRT from its submatri- 
ces H ( " " ,) (see Appendix|B]) which are deduced from C (,,) and r (n) . Computation of C (,l) and 
r (n) are with of complexity o(R 2 T^j and o(NR 2 ^, respectively. According to Theorem IB. 21 

each off-diagonal submatrix has a complexity of O (R 2 I n I„^, it follows that computation of 

the whole H has the complexity of o(R 2 T 2 ^. Note that H has R 2 T 2 elements. Inverse H~' 

can be computed with a complexity of o(R 3 T^. The gradient g is computed at a cost of 

O(NRJ). Thus dGN has a complexity per iteration of o(NRJ + R 3 T 3 ). 

Complexity of the fLM algorithm is analyzed for each step in Algorithm[T]as follows 
Step|3] computes N matrices C (n) and r ( '° with complexity o(r 2 t) and o(nR 2 ) as in dGN. 

Hence, building up K is of complexity o(N(N - l)(N - 2)R 2 ) = o(n 3 R 2 ). 
StepH inverts , n = 1, 2, . . . , N at a cost of O (NR 3 ). 
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Step [5] computes the damped factors A (,l) at a cost of O(NRJ), and is one of the most expen- 
sive steps in the fast dGN algorithm. We note that the large workload Y(„) o A v ' is 

kj=n 

used for evaluation of gradient, and exists in all CP algorithms such as ALS, OPT. 
Step|7] builds up the block diagonal matrix ^ with a complexity 0(NR 4 ^. 
StepH] solves the inverse problem <D _1 w^ with a cost of O^ 3 /? 6 ). This step is much faster 

than inverse of the approximate Hessian 0\R 3 T 3 J due to R <K /„ or NR < T. 

Instead of construction of the approximate Hessian, the fLM algorithm builds up the 
much smaller matrix O of size NR 2 x NR 2 . Hence, besides the cost of computation of the 
gradient or the damped ALS factors, fLM computes O and O 1 at a cost of o(R 2 T + N 3 R 6 ^j 
which is much smaller than the cost for construction of H and for H 1 in dGN. 

The total expense of fLM per one iteration is approximately O (NRJ + A^ 3 /? 6 ). For N >1 , 
the proposed algorithm has the same order of complexity as that of ALS. However, fLM is 
much faster than ALS because it requires less iterations than ALS. 

4.5. Damping parameter in the LM algorithm. The choice of damping parameter /i 
in the fast dGN algorithms (14. 3t affects the direction and the step size Aa = H^ 1 g in the 
update rule (13.31 i: a <— <X + Aa |fl~8). In this paper, the damping parameter p. is updated using 
the efficient strategy proposed by Nielsen (18): 



2max{-,l-{2p-iy\, p > 0, 



2p, 



ll«Mll!-lk»ll! 

P = TTT, '- 



Aa 1 (g + yu Aa) 



g = J T (y-y) = 



otherwise, 



vec|Y (1) (Oa w | -A«T« 



vec|Y ( Ao ©A« -A»r« 



(4.24) 
(4.25) 

(4.26) 



where e, = vec(^ - 9r), the gradient g can be straightforwardly derived as in (1D.H or in 
||29ll3TI . The factors A (n) will be updated unless the new approximate is lower than the 
previous one: \\e t \\2 < Ikz-ilb- The algorithm should stop when yu increases to a sufficiently 
large value (e.g., 10 30 ). In practice, the factors A ( "' are often initialized using the mode-n 
singular vectors of the data tensor J5]|7][14], then run over ALS d3.lt after few iterations. 
According to the CP model d2.3t , all the components a^'Un + N) except ones of the last 
factor are unit-length vectors. The initial value of the damping parameter p is chosen as the 
maximum diagonal entry of H as 

ju = rmax {diag (H)} = r max {diag (r (1) ) ■ • • diag (r (n) ) ■ • • diag (r (A °)} 

= rmax{l,diag(C (A °)} , (4.27) 



where t is typically in the range of [10 8 , 1]. 

5. Complex-valued tensor factorization. This section aims to extend the dGN algo- 
rithms to complex-valued tensors. Although a real-valued tensor is considered as a complex- 
valued tensor with zero imaginary part, for simplicity algorithms for real- and complex-valued 
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tensors are introduced in two separate sections. For the complex case, CP model is to find 
complex- valued factors A (n) e C 7 " xS . 

The damped Gauss-Newton-like update rule (13.3b is rewritten to update complex-valued 
factors Il8ll23l 



a<-a + (j H j + /d) 1 J 77 (3/ -J): 



(5.1) 



where symbol "H" denotes the Hermitian transpose, and the Jacobian J is given in dB.lt . The 
approximate Hessian H = J 77 J slightly changes from that for the real-valued tensors. A fast 
and efficient computation method for the complex-valued approximate Hessian H will be pre- 
sented so that the final update rule does not employ both of the Jacobian and the approximate 
Hessian. We consider H as a partitioned matrix of (N x N) sub-matrices H ( " ,m) e £RinXRi m ^ 
n,m — 1,2, . . . ,N . Each sub-matrix H ( " " !) is a partitioned matrix of (/? x /?) subsub matrices 
H^ m) e C 7 "* 7 '", n,m - 1,2, ...,N, r,s = 1,2,...,/?. The explicit expression of the approx- 
imate Hessian H is deduced from the following theorems which can be derived in a similar 
manner as for real valued tensors. 

Theorem 5.1 (Subsub-matrices H"f ). Hr"/" ; are diagonal or rank-one matrices given 

by 

H^ ffl) = 6„, m I 7 „ + (1 - 8 n , m ) r £ ffl) 4 n) « ( r )H , (5-2) 
where y^l are the (r, s) entries of the Hermitian matrices T^ n,m) — ® A^ H A^ k \ 

k+n,m 

Theorem 5.2 (Sub-matrices H ( "- m) ). With K defined as in A4.2[ , H ( "' m ' are expressed in an 
explicit form as 

H ( "' m) = 5„, m (r (n) ® I 4 ) + (l s ® A (n) ) K ( "- m) (l s ® A (m)77 ) . (5.3) 



Theorem 5.3 (Low-Rank Adjustment). For NR <K T, the approximate Hessian H = J 77 J 
can be expressed as a low-rank adjustment given by 



H = G + ZKZ H , 



(5.4) 



where sparse matrices G, Z and K are defined as in H4.10[ , H4.11[ and H4.2i . 

The damped Gauss-Newton algorithms for complex-valued tensor factorization are stated 
in following theorems: 

Theorem 5.4 (damped GN algorithm for complex-valued tensor factorizations). The fac- 
tors A (n) are updated using the rule given by 



a+ (H + ^ir 1 g, 



(5.5) 



where the approximate Hessian H is defined in Theorems \5. 1 l or l372l an Levenberg-Marquardt 
regularization parameter p > and the gradient g € C RT is computed as 

J-.N T 



g 



vec(Y w (OA«*J-A(">r<"> r ) 



(5.6) 



where symbol '*' denotes the complex conjugate. 

Theorem 5.5 (fast dGN for low rank approximation). For NR <K T, the factors A ( "^ are 
updated using the fast update rule given by 
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A w «_ A w + a« (i* - (f„ + r<"> r ) if) , 



(5.7) 



where F„ are frontal slices of a 3-D tensor 3^ whose vec(3 r ) = B^h^, B^, = 1 + lO j/K 
is invertible, or B^ = K + K) , andw^ is computed from the damped ALS factors A^"' 



IV = blkdiagjff <s> A ( " )H A W ) 



A („)tf A („)_ rf W 



(5.8) 
(5.9) 

(5.10) 
(5.11) 



6. Experiments - Computer simulations. The CP algorithms were verified for difficult 
data with collinear factors in all modes (swamp). Collinearity degree of factors was controlled 
by mutual angles between their components. Collinear factors A (n) were generated from 
random orthonormal factors U (n) 



af=uf + vuf>. 



ve(0,l],Vn,Vr* 1. 



Mutual angles 9 qj between af } and a " , q + r were in a range of (0, 60°] for v e (0, 1] 



tan(6U) 



9=1, 



' Vv 2 + 2, q+\,r. 



(6.1) 



(6.2) 



For example, v = 0.1,0.2, . . ., 1 yield U = 6°, 11°, 17°, 22°, 27°, 31°, 35°, 39°, 42°, 45°, and 
6 q>r = 8°, 16°, 23°, 30°, 37°, 43°, 48°, 52°, 56°, 60°, q+\,q + r, respectively. For high v such 
as v = 2, Q\ r w 63° and 9>r w 78°, tensor can be quickly factorized by CP algorithms. The 
higher the parameter v, the lower the collinearity of factors. It is more difficult to factorize 
tensors with lower v (e.g., v = 0.1, 0.2). However, when v > 3, another issue arises from large 
difference in magnitude between components. The tensors are still difficult to factorize even 
thought collinearity of factors is low (9\, r > 71°). CP tensors, as in (12. 3K can equivalently be 
constrained to be of the form 



y =Y,*raV oaf q... o«W 



(6.3) 



where \\a r . lb = l,Vr, and each A r encodes the magnitude. For this experiment A[ = 1, and 
A r = (1 +v 2 ) N/2 ,Vr > 1. Therefore, for v = 3,4,5 and N = 3, A r = 31.6,70.1, 132.6, Vr# 1, 
respectively. That means the components a" , r = 2, ...,R are relatively larger than the 
first component. We analyze synthetic tensors for two cases: error-free and noisy data with 

\\M II 2 

additive white Gaussian noise at SNR (= -10 log 10 -^rfff) = 30 dB or 40 dB added to the 

data tensor y = y + o\NT , where N denotes a normally-distributed random tensor of zero 
mean and unit variance whose «;,;,.../„ ~ N(0, 1). 
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In order to evaluate the factorizations for collinear data, we measured the Median Squared 
Angular Error (MedSAE) over multiple runs between the original and estimated components 
a " , "ay after matching their orders defined as 

MedSAECai."',^"') = 10 log 10 (median (a<" )2 )) (dB), (6.4) 
where cr'"' = arccos — % — . Cramer-Rao Induced Bound (CRIB) on o-l"' 2 was computed 

h\K lb 

from the Cramer-Rao lower bound (CRLB) for estimating the component ay B15II25I - I271 

tr Ul In - V*Vlla^I 2 )CRLB(a«)) 
CRIB(a<.» )2 ) = 101og 10 ^ -— '- (dB). (6.5) 

\K \\ 2 

For our simulations, due to the same collinearity degree v for all the components, we have 

CRIB(ff<" )2 ) = CRIB(<^ 1)2 ), Vr, Vn, 
CRIB(<^" )2 ) = CRIB(4" )2 ), V«, r = 2, . . . , R. 

The average MedS AEs for the estimated components were compared against the average 
CRIB. It is important to note that an MedSAE lower than -30 dB, -26 dB or -20 dB means 
two components are different by a mutual angle less than 2°, 3° and 6°, respectively. Practical 
simulations show that it is difficult for MedSAE to reach a CRIB > -30 dB, since collinearity 
of factors has been destroyed by noise. Discussion on effects of noise on collinear data in 
Appendix [F] gives us insight into when CP algorithms are not stable, and when they succeed 
in retrieving collinear factors from noisy tensors. 

6.1. Comparison between dGN and fLM for 3-D tensor factorizations. This sec- 
tion compares performance of fLM and the standard dGN algorithm in the Matlab routines 
PARAFAC3W developed by Tomasi M28II32L The dGN algorithm |28) computes the approx- 
imate Hessian and gradient, and employs Cholesky decomposition and back substitution to 
solve the inverse problems H ~ l g. Unfortunately, this toolkit supports only 3-D data. The 
fLM a algorithm was verified, and shortly denoted by fLM. 

In the first set of experiments, random synthetic tensors were generated from 3 collinear 
factor matrices of size I x R where / = 100 and R = 5, 10, 20, 30, 40, 60 and v =__0.5. From 
each noise-free CP tensor y composed from A (n) € R IxR , twenty noisy tensors y of 30 dB 
SNR were generated. There are in total 200 rank- R tensors y. MedSAE for each component 
was deduced from 200 runs for each test case. 

Both algorithms were initialized by the same factors which were the mode-n singular 
vectors of the data tensor [7|. Algorithms stop when 10 differences of successive relative er- 
_ ^||^. 

rors s = were lower than 10~ 8 , or until the maximum number of iterations (1000) 

iryiif 

was achieved. Execution time for each algorithm was measured using the stopwatch com- 
mand: "tic" "toe" of MATLAB release 2009a on a computer which had 2 quadcore 3.33 GHz 
processors and 64 GB memory. Tucker compression was not used in the simulations. The 
dGN in 11281 was adapted to follow the same stopping criteria and the same computational 
time measurements, while its other parameters were set to default values. 

Fig. |6. l(a)| visualizes the overall execution times in seconds and the average execution 
times per iteration for both algorithms. The speed-up ratios for the overall decomposition 
between dGN and fLM were approximately 6.4, 14.6, 35.1, 16.7, 7.8 and 2.8 times for R = 5, 
10, 20, 30, 40, 60 respectively, while the speed-up ratios per iteration were respectively 5.6, 
14.7, 20,7, 11.3, 6.5 and 2.7. We note that the numbers of iterations of dGN and fLM were 
slightly different because of differences between them in controlling the damping parameters . 
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(a) 



Overall execution time and average execution time per 
iteration. 



(b) MedSAE and CRIB 



Fig. 6.1. Comparison between the dGN (green lines) and fLM (magenta lines) algorithms for factorization of 
100 X 100 X 100 dimensional tensors composed by collinear factors for various R at SNR = 30 dB :\(a)\ the overall 
execution times in second (dashed lines) and the average execution times per iteration (solid lines );Vbj\the average 
MedSAE values (dB) of the first components a\ (square marker) and of other components a), (triangular marker), 
r = 2,...,R,n = 1,2,3. 



In Fig. |6.1(b)| we illustrate the average MedSAE values of dGN [28 1 and fLM. The mean 
MedS AEs for the first components af \ n = 1 , . . . , N were calculated over N MedSAE(a ( j" )2 ); 
whereas the mean MedSAEs for the other components a, , r - 2, 3, ...,R, n = 1, . . . ,N 
were calculated over (Nx(R- 1)) MedSAE(a£f 2 ). Fig |6.1(b)| shows that the average values 
of MedSAE(aS" )2 ), r > 2, V«, asymptotically attained the CRIB. It means that both dGN and 
fLM well reconstructed components ay, r = 2, ...,R, Vn even for R = 60. To be accurate, 
CRIB is a theoretical lower bound on the mean of the square angular error, not on the median. 
In these simulations, the median and mean SAEs appeared to be nearly identical so that only 
the former one is shown. 

(n) 

For the first components a\ , performances of dGN and fLM were equivalent in the sense 
of collinearity reconstruction for small R — 5, 10. For/? = 20, 30, fLM still reconstructed the 
first components. Note that although MedSAEs were different, the relative approximation 
errors s of two algorithms were almost the same but they were not presented here. The 
difference in component reconstruction was caused by implementation of the control strategy 
for damping parameter. For R > 40, the average MedSAEs of the two algorithms were much 
worse than the CRIB, and they were not able to reconstruct the first components. Indeed, we 
cannot recover the first components due to noise for high R. 

In order to analyze complexity of the two algorithms for higher ranks R I, we decom- 
posed tensors of the same dimensions whose entries were randomly generated. The rank R 
varied from 5 to / = 100. The amount of allocated memory and average execution time per 
iteration were measured on the computer (PCI) in the previous simulations and on a com- 
puter (PC2) which had 2.67 GHz i7 CPU and 4 GB of memory. The results were summarized 
in Fig. 16.21 For high rank R > 50, dGN required more than 4 GB of memory and could 
consume 20 GB of memory for R = 100 whereas fLM need less than 4 GB of memory. On 
PCI which had 64 GB of memory, fLM was slightly more time consuming for R > 90 than 
dGN because the advantage of the fast inversion in (14.6b was lost. However, dGN became 
dramatically time consuming on PC2 when R > 40. 
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R R 



(a) Allocated memory. (b) Execution time per iteration. 

Fig. 6.2. Memory requirements and execution time per iteration of dGN andfLM in approximation of 100 X 
100 X 100 dimensional tensors by rank-R tensors where R = 5, 10, 20, . . . , 100. 



6.2. Factorization of higher-order real-valued tensors. The proposed algorithms have 
been extensively verified and compared with the ALS algorithm plus line seach in the N-way 
toolbox [2 1, for 4-D tensors of size /„ = 50, various ranks R = 5, 10, 15, and with differ- 
ent collinearity degree v = 0.1,0.3,0.5,0.7,0.9. The 4-D tensors were corrupted by additive 
Gaussian noise at SNR — 40 dB. For each pair (v, R) MedSAE was computed from 400 
runs. Execution times (seconds) were measured on a computer that had 6-core i7 3.33 GHz 
processor and 24 GB memory. 

Algorithms were analyzed under the same experimental conditions as in the previous 
simulations. They iterated until successive relative errors s were lower than 10~ 12 , or the 
maximum number of iterations (5000) was achieved. The ALS algorithm plus line seach 
(ALSls) was adapted to have the same stopping criteria. 

At SNR = 40 dB and ranks R = 5, 10, 15, CRIBs are relatively high (> 40 dB) for most v 
(see Fig. |6.3(d)) . Hence, CPD algorithms easily estimated collinear factors and obtained high 
MedSAE comparable to the CRIB. Fig. |6.3(d)| shows that MedSAEs of ALSls and fLM were 
almost similar and approached CRIB except those for/? = 15 and v = 0.1. It should be noted 
that factorization became more difficult in the case of higher rank R and lower v. Execution 
times of algorithms for different R and v are illustrated in Figs. |6.3(a)|6~3(c)| The results 
indicate that the higher the collinearity degree (i.e., smaller v) the more time-consuming the 
algorithms. For example, ALSls on average ran 2083 iterations in 957 seconds to factorize 
4-D noisy tensors when R = 10 and v = 0.1. However, when keeping the tensor size and 
rank R and changing v = 0.9, this algorithm ran 34 iterations in 14 seconds. For the same 
tensors with v = 0.1, fLM took only 48.6 seconds on average to execute 384 iterations, and 
took 6 seconds for 21 iterations with v = 0.9. That means fLM was 21 times faster than ALS 
with v = 0.1. For 4-D tensors of R = 15 and with v = 0.1, ALSls ran 4225 iterations in 
2255 seconds on average, while fLM took only 103 seconds to execute 494 iterations. Hence, 
fLM was 24.7 times faster than ALSls for the difficult test case. More execution times and 
speed ratios are given in Table 16. ll Speed ratio between ALSls and fLM was high for highly 
collinear data (e.g., v = 0, 1). For example, fLM was at least 17.1 times and up to 24.8 
times faster than ALSls for collinear data with v = 0. 1 . For lower collinearity degree, ALSls 
quickly factorized the tensor after few iterations. Although the speed ratio decreased, fLM 
was still approximately 3 times faster than ALSls. 
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Fig. 6.3. Comparison between ALSls and fLM for factorizations of 4-D tensors of size 50 X 50 X 50 X 50 at 
SNR = 40 dB. \(a)\( c)\ execution times (seconds) were measured when algorithms factorized tensors of various ranks 
R = 5, 10, 15. \(d)\the average MedSAE (dB) for all components compared with CRIB. 



6.3. Factorization of complex-valued tensors. In the next set of simulations, we con- 
sidered factorization of complex- valued tensors. Factors A (n) e £70xR were generated in the 
same manner as for experiments in the previous section. However, they had random real 
and imaginary parts. In addition to collinearity degrees v = 0.1, 0.2, . . . , 0.5, we considered 
v = 3,4, 5. We note that although collinearity of factors is low for high v = 3, 4, 5 (0i $r > 71°), 
the tensors are still difficult to factorize. 

We compared fLM with ALS plus line search (ALSls). Algorithms stopped when differ- 
ences between successive relative errors were lower than 10~ 8 , or the maximum number of 
iterations (2000) was achieved. In Figs. |6.4(a)||(b)] we illustrate the average MedSAE of all 
factors for 70 x 70 x 70 x 70 dimensional tensors with ranks R = 5 and 15 over 200 runs. 
ALSls achieved good performance with v = 0.2, and excellent MedSAE with v = 0.3, 0.4 
and 0.5. However, for high collinearity degree v = 4 and 5, ALSls did not obtain perfect 
reconstruction. The fLM algorithm outperformed ALSls for all test cases. Figs. |6.4(c)|(d)| 
indicate that the number of iterations of ALSls tended to decrease gradually as v increased 
from 0.1 to 5. For v = 3,4,5, ALSls stopped after tens of iterations because there was not 
any significant change in the relative error. Figs. |6.4(c)|(d)| also reveal that fLM required less 
iterations for higher v. Difference in magnitude between components did not affect fLM. 
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Table 6.1 

Comparison of average execution times (seconds) between fLM and ALSls for factoriza tions of 4-D and 5-D 
tensors of size /„ = 50 at SNR = 40 dB composed by collinear factors with various v = 0.1,0.3,0.5,0.9 and for 
various R. For each pair (TV, I„,,R, v), speed-up ratio and execution times are given as indicated in the subtable at 
the bottom. 



Tensor's size 
(N-D, I m x R) 



Collinear degree v 



0.1 



0.3 



0.5 



0.7 



Execution timeALSls (seconds) 



Execution timefLM (seconds) 



0.9 



4-D, 50 x 5 


17.1 


347 
20 


11.1 


65 
6 


6 


28 
4.4 


3.9 


15 

3.9 


2.8 


11 

3.8 


4-D, 50 x 10 


21.2 


957 
49 


9.6 


90 
9 


4.9 


34 
7 


6 


40 
6 


2.5 


11 

6 


4-D, 50 x 15 


24.8 


2,201 
99 


15.4 


263 
16 


4.2 


48 
11 


3 


29 
10 


2.9 


29 
9 


5-D, 50 x 5 


22 


17,245 
790 


8.1 


2,747 
346 


4.6 


1,240 
453 


4.2 


821 
205 


3.4 


730 
251 
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Fig. 6.4. Illustration for MSAE for factorization of 4-D complex-valued tensors with size I„ = 70 and ranks 
R = 5, 15. Algorithms stopped as they reached a derivative of successive relative errors of 10~ 8 or 2000 iterations. 



7. Conclusions. Simulations for real- and complex-valued tensors confirmed the fLM 
algorithm was faster than dGN and ALS, and outperformed ALS in the sense of approxima- 
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tion accuracy (MedSAE) for difficult test cases. Moreover, MedSAE of fLM was comparable 
to CRIB for most test cases even for noisy tensors. For the collinearity modification used in 
the simulations, we also show that for the same tensor size and collinearity degree, the higher 
rank R the data tensor has, the more difficult the factorization is to retrieve the factor. For the 
same size /„, rank/?, and collinearity degree, the higher the dimensions of the data tensor, the 
higher the performance of factorization can be achieved. 

Most CP algorithms incorporated with line-search techniques work well for general data, 
but often fail for highly collinear data with bottlenecks or swamps. The dGN/LM algorithms 
112011291 can deal with such data, but demand extreme computational cost associated with 
large-scale inverse of approximate Hessians. In this paper, by employing the special structure 
of the approximate Hessian, a fast inverse for the approximate Hessian has been derived, and 
low complexity damped Gauss Newton algorithms have been proposed for factorization of 
low rank real- and complex-valued tensors. The proposed algorithm avoids building up the 
whole approximate Hessian and its inverse by working with much smaller matrices of size 
NR 2 x NR 2 instead of (RT x RT). Extensive experiments for tensor factorizations showed 
that our algorithms outperformed "state-of-the-art" algorithms for difficult benchmarks for 
both real and complex- valued tensors. The proposed dGN/LM algorithms can be extended to 
the nonnegative CPD in which factors are nonnegative matrices. Moreover, our algorithms 
can be simplified to estimate only one factor for supersymmetric tensor factorization which 
can be found in multiway clustering, or to the INDSCAL decomposition B51 1T51 . 

Acknowledgments. The authors wish to thank the referees for the very constructive 
and detailed comments and suggestions which led to major improvements in the manuscript. 
They also thank for Dr. Benedikt Losch and Mr. Austin Brockmeier for their suggestions that 
helped improving the manuscript. 

Appendix A. Commutation Matrices. A commutation matrix Q„ expresses connection 
between vectorizations of tensor unfoldings, and often exists in construction of the Jacobian 
J and the approximate Hessian H in dGN algorithms for CP and Tucker decompositions [22]. 

Lemma A. 1 . (mode-n to mode- 1 unfolding) Commutation matrix Q„ which maps vec(A(i)) = 
vec(^l) = Q„ vec(A (B) ) is given by Q„ = I /n+1:A , ® P/ l: „_,,/„, with h j = Y\ J k=i h- 

Appendix B. Proof of Theorem 14.21 

In order to prove Theorem 14. 2J we seek explicit expressions for the Jacobian and the 
approximate Hessian in the next section. 

Lemma B.l. The Jacobian matrix J has a form of A20IU71/ 

N 

(B.l) 

n=\ 

We express the approximate Hessian H as an N x N block matrix H = [EP"' m) ] , H ( "' m) 
of size RI n x RI m . 

Theorem B.2. (see also [20, 29]) A submatrix JSS n > m > has an explicit expression given by 
H ( "' ffl) = 6 n , m (r (n) a I,„) + (l s ® A (n) ) K ( "- m) (l« 9 A (m)T ) , Vji, Vm. (B.2) 

By establishing expressions for submatrices H ( "' m) , we can prove Theorem 14. 2 1 
Proof. (Theorem l4.2l i From (1B.2I I. we construct a sparse matrix G consisting all block 
matrices H (n) n = 1,2, ...,N, that is 

G = blkdiag(H ( " ) ) A ' = blkdiag(r (n) ® 1/ f . (B.3) 

V fn=\ \ n fn-\ 



Q, 0A%I 
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From Theorem IB. 2 1 and by using the product of block matrices, it is straightforward to 
decompose H - G into three matrices defined in Theorem |4.2| as 

H-G = ZKZ r . (B.4) 



Appendix C. Proof of Theorem 14.31 

Proof. The damped approximate Hessian = G + /Art + ZKZ r is adjusted from 
Gfi = G + fjlRT by a low-rank matrix ZKZ r . Hence, its inverse can be quickly computed by 
applying the binomial inverse theorem (see page 18 ifPJl ) 

, [g;, 1 - G" 1 Z (KT 1 + Z r G- 1 z) _1 Z T G~\ if K is invertible, 

H~ = I " v " I J " ' (C.l) 

(G; 1 -G^'ZK (I NR 2 + Z r G / ; 1 ZK) Z T G~\ otherwise. 

Denote by G^ inverse of the block diagonal matrix G^, which is also a block diagonal matrix 

G„ = (blkdiag((r w +^I fi )®I / „)^ = J ' = blkdiag (if ® I 4 ) ^ . 

Similarly, we denote = G^ 1 Z and = Z T G^ 1 Z. From ( 14.1 U and by taking into ac- 
count (if ® I 7 „J (l s ® A*' ) = if ® A (n) , we have 

= G^ 1 Z = blkdiag (if ® I/„ ^ blkdiag (l« ® A (n) )^ =i 

= blkdiag (if ®A (n) ) . 

V / n-l 

= blkdiag (if ® C (n) ) . (C.2) 

V /n=l 

Finally, we define B as in ( 14.61 l. and easily deduce ( 14.121 ) from (IC.lt . □ 
Appendix D. Proof of Lemma FOl 

Proof. From ( IB. 11 1. ( 14.13b . and note that vec(£) = Q„ vec(£(„)), where Q„ is defined in 
Lemma IaTI the product G^ g can be expressed in a block form as 

(G„ f vec(£)) r =[vec(£) r Q„ |0A«) if ) ® I In )T = Lc(e (j!) (© A«)rf ^ 

Jy w ( O a<*>) rf - aw ( O a®) ( O a«) if 
vi{kf -A«r w rf) r . 



(D.l) 



Similarly, a convenient formula to compute Lj g is given by 



*V = L; J y vec(£) = 



vec 



a w, a;'- it; 



(D.2) 
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Finally, for each frontal slice F„ of the tensor & e R RxSxA ' whose vec(9 r ) = B^h^, we have 
(f J° ® A"") vec(F„) = vec(A"" F„ fj°) . (D.3) 

From ( I4.14l i. we obtain ( I4.21l i. Each product inside (ID.31 > has a complexity of o(l„ R 2 + /? 3 ). 

Hence, L^f in d4~2TT ) has a complexity of o(TR 2 + NR 3 ) * o(77? 2 ) which is lower than 

0(TR 3 ) by a factor /? for building up and direct computation /. Furthermore, this fast 
computation does not use any significant temporary extra-storage. □ 

Appendix E. Inverse of The Kernel Matrix K. 

Theorem E.l. Inverse o/K defined in A4.2i is a partitioned matrix K = K 1 whose blocks 
K ( "'"'\ for n = l,.,.,N,m= 1, . . . , N are given by 

K ( "' m) = - <0 diag (vec(C (n) © C (m) t)) P r . (E.l) 



Appendix F. Effects of noise on collinear data. 

This section discusses briefly effects of noise on factorization of collinear tensor gener- 
ated by the modification d6.lt . Consider matrix factorization of the mode-n tensor unfolding 

Y ( „)=aw|OawJ +E (n) . (Fl) 

Analysis of singular values of Y(„) or eigenvalues of Y(„) Y^ allow predicting whether fac- 
torization succeeds in retrieving collinear factors from noisy tensors. This also gives insight 
into when CP algorithms are not stable, and yield non-unique solution. 



1 



It is straightforward to prove that E = 



The modification ( 16.11 ) can be expressed as A (n) = U (n) Q, where Q = 
R RxR . In theory, for noisy tensors ^ with /„ = /, Vn, we have 

Y w Yf B) = A (,,) r (,,) A (n) T + E w Ej n) = U (,!) E U w T + a 2 f~ x \ . (F.2) 

Q Q Q , W m denotes element -wise power, and 

2 | Will R 2 + (R-l)xy-l x _ 1+ ^ y _^-i (F3) 

^ iqSNR/10/JV 1()SNR/10/JV ' i+v,y x . i^r.j; 

R 2 +(R-l)(y-l) v(R + y-l)ll_, 

v(R+y -l)l R ^ (a: - 1) (1r_i + (y - 1)Ir-i) 
has (7? - 2) identical eigenvalues A r = (x - l)(y - 1), r = 2, ...,/?- 1, and its largest and 
smallest eigenvalues /li > /!,. > Ar are solutions of a quadratic equation 

= xy + (R-2)(R + x+y) + 3, (F.4) 
zliA* = (jc- l)(y- 1) = A r , 2<r<tf-l. (F.5) 

Fig. |F. 1 (a")| illustrates A r (r- 1, ...,R) for 3-D noiseless tensors with / = 100 and R = 15 
compared with the noise levels a 2 I N ~ ] at SNR = 20 dB and 30 dB. The higher the collinearity 
degree of factor, the smaller the eigenvalues A r . If eigenvalues A r are considerably lower than 
the noise level <x 2 I N ~ l , the factorization becomes infeasible, e.g., as v < 0.1. 

Because U (,,) are orthonormal, Y(„) YL has R leading eigenvalues A r - A r + cr 2 / (AL1) , r 

I R, and (I - R) eigenvalues A t = o- 2 I {N - l \i = R + 1, ...,/. In Fig. |F.l(b)| we plot 

eigenvalues Aj for noisy tensors having the same dimension as that of tensors illustrated in 
Fig. |F. l(a)| The largest eigenvalue Ai significantly exceeds the noise levels, whereas Ar is 
quite close to the noise level at SNR = 20 dB for v < 0.3, or at SNR = 30 dB for v < 0.1. 
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(a) Eigenvalues X r , r = 1, . . . ,R (= 15), /„ = 100, N = 3. (b) Eigenvalues % h i = !,...,/„(= 100), R = 15, JV = 3. 



Fig. F.I. Analysis of eigenvalues o/Y(„) Yj, for 3-D tensors of size I„ = 100 and rankR = 15. R leading eigen- 
values A r for noiseless tensors and A r (r = 1, . . .,R)for noisy tensors are compared with noise levels (green shading) 
at SNR = 20 dB and 30 dB. The more the eigenvalues are in the noise zone, the more difficult the factorization of 
noisy tensors to retrieve collinear factors become. 
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